(54) Abstract Title

Representing and searching for colour images 

A method of representing a colour image comprises selecting a region of the image, selecting one or
more colours as representative colours for the region and, for a region having two or more representative
colours, calculating for each representative colour at least two parameters related to the colour distribution
in relation to the respective representative colour and using said parameters to derive descriptors for the image
region. Alternatively, a function approximating the colour distribution for each representative colour, or an
indication of the spread of the colour distribution of each representative colour, can be used to derive
descriptors. Images can be represented using these descriptors, which can be searched by comparing
the descriptors with that of a user-selected image.

Method and Apparatus for Representing and Searching for Colour Images 

The present invention relates to a method and apparatus for
representing a colour image or a region of an image for searching purposes,
and a method and apparatus for searching for colour images or image regions.

Searching techniques based on image content for retrieving still 
images and video from, for example, multimedia databases are known. 
Various image features, including colour, texture, edge information, shape and 
motion, have been used for such techniques. Applications of such techniques 
include Internet search engines, interactive TV, telemedicine and
teleshopping. 

For the purposes of retrieval of images from an image database, 
images or regions of images are represented by descriptors, including 
descriptors based on colours within the image. Various different types of
colour-based descriptors are known, including the average colour of an image
region, statistical moments based on colour variation within an image region,
a representative colour, such as the colour that covers the largest area of an
image region, and colour histograms, where a histogram is derived for an
image region by counting the number of pixels in the region of each of a set of
predetermined colours.

A known content-based image retrieval system is QBIC (query by 
image content) (see US 5579471, MPEG document M4582/P165: Colour 
Descriptors for MPEG-7 by IBM Almaden Research Center). In one of the 
modes of operation of that system, each image in a database is divided into
blocks. The pixels of each block are grouped into subsets of similar colours and the largest
such subset is selected. The average colour of the selected subset is chosen as
the representative colour of the respective block. The representative colour 
information for the image is stored in the database. A query in the database 
can be made by selecting a query image. Representative colour information 
for the query image is derived in the same manner as described above. The 
query information is then compared with the information for the images
stored in the database using an algorithm to locate the closest matches. 

MPEG document M4582/P437 and US 5586197 disclose a similar 
approach, but using a more flexible method of dividing an image into blocks 
and a different method of comparing images. In another variation, described
in MPEG document M4582/P576: Colour representation for visual objects, a
single value is used for each of two representative colours per region.

Several techniques for representing images based on colour histograms 
have been developed, such as MPEG document M4582/P76: A colour 
descriptor for MPEG-7: Variable-Bin colour histogram. Other techniques use 
statistical descriptions of the colour distribution in an image region. For 
example, MPEG document M4582/P549: Colour Descriptor by using picture 
information measure of subregions in video sequences discloses a technique
whereby an image is divided into high and low entropy regions and colour 
distribution features are calculated for each type of region. MPEG document
M4852/P319: MPEG-7 Colour Descriptor Proposal describes using a mean
and a covariance value as descriptors for an image region.

All the approaches described above have important shortcomings. 
Some of them, in particular colour histogram techniques, are highly accurate, 
but require relatively large amounts of storage and processing time. Other 
methods, such as the ones using one or two representative colours, have high 
storage and computational efficiency but are not precise enough. The 
statistical descriptors are a compromise between those two types of 
techniques, but they can suffer from a lack of flexibility, especially in cases
where colours of pixels vary widely within a region.

The present invention provides a method of representing an image by 
approximating the colour distribution using a number of component 
distributions, each corresponding to a representative colour in an image 
region, to derive descriptors of the image region. 

The invention also provides a method of searching for images using 
such descriptors. 

The invention also provides a computer program for implementing 
said methods and a computer-readable medium storing such a computer 
program. The computer-readable medium may be a separable medium such 
as a floppy disc or CD-ROM or memory such as RAM. 

An embodiment of the invention will be described with reference to 
the accompanying drawings of which: 
Fig. 1 is a block diagram of a system according to an embodiment of
the invention; 

Fig. 2 is a flow chart of a first search method; and

Fig. 3 is a flow chart of a second search method.

A system according to an embodiment of the invention is shown in
Fig. 1. The system includes a control unit 2 such as a computer for 
controlling operation of the system, a display unit 4 such as a monitor, 
connected to the control unit 2, for displaying outputs including images and 
text and a pointing device 6 such as a mouse for inputting instructions to the 
control unit 2. The system also includes an image database 8 storing digital 
versions of a plurality of images and a descriptor database 10 storing 
descriptor information, described in more detail below, for each of the images 
stored in the image database 8. Each of the image database 8 and the 
descriptor database 10 is connected to the control unit 2. The system also 
includes a search engine 12 which is a computer program under the control of 
the control unit 2 and which operates on the descriptor database 10.

In this embodiment, the elements of the system are provided on a 
single site, such as an image library, where the components of the system are 
permanently linked. 

The descriptor database 10 stores descriptors of all the images stored 
in the image database. More specifically, in this embodiment, the descriptor 
database 10 contains descriptors for each of a plurality of regions of each 
image. The descriptors are derived as described below. 
Each image in the database 8 is divided into a number of non-overlapping
rectangular blocks of pixels. For each block, a colour histogram is then derived
by selecting a predetermined set of colours and counting the number of pixels
in the block having each colour.
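The block division and histogram step can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the image is assumed to be a list of rows of (r, g, b) tuples with components from 0 to 255, and the "predetermined colours" are assumed to be a uniform quantisation of each channel into `levels` steps. The names `quantise` and `block_histograms` are hypothetical.

```python
from collections import Counter


def quantise(pixel, levels=4):
    """Map an (r, g, b) pixel with 0-255 components to one of levels**3 bins.

    Uniform quantisation is an assumed choice of the predetermined colours.
    """
    step = 256 // levels
    return tuple(min(c // step, levels - 1) for c in pixel)


def block_histograms(image, block_size, levels=4):
    """Split `image` (a list of rows of (r, g, b) tuples) into non-overlapping
    block_size x block_size blocks and count the pixels of each quantised colour.

    Returns {(block_row, block_col): Counter}; blocks on the right and bottom
    edges are smaller if the image size is not a multiple of block_size.
    """
    height, width = len(image), len(image[0])
    histograms = {}
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            counts = Counter()
            for y in range(top, min(top + block_size, height)):
                for x in range(left, min(left + block_size, width)):
                    counts[quantise(image[y][x], levels)] += 1
            histograms[(top // block_size, left // block_size)] = counts
    return histograms
```

For a 4 x 4 image whose left half is pure red and right half pure blue, `block_histograms(img, 2)` gives four blocks, the top-left one counting four pixels in bin `(3, 0, 0)`.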

The colour histogram so obtained shows the colour distribution of the
pixels within the block. In general, the block will have one or more
dominant colours, and the histogram will have peaks corresponding to those 
colours. 

The descriptors for the blocks are based on the dominant colours as 
identified from the histogram. The descriptor for each block has the following 
elements: 

(1) The number of dominant colours, n, called the degree of the
descriptor, where n ≥ 1; and

for each dominant colour: 

(2) (a) a weight representing the relative significance of the respective 
dominant colour in the block. Here, the weight is a ratio of the number of 
pixels in the block of the relevant colour to the total number of pixels in the 
block. 



(b) a mean value, m

$$\mathbf{m} = \begin{pmatrix} m_x \\ m_y \\ m_z \end{pmatrix}$$

where x, y and z index colour components, for example the red, green and
blue colour components of the colour in RGB colour space. Here, the mean
value corresponds to the colour components of the respective dominant
colour.



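(c) a covariance matrix C

$$C = \begin{pmatrix} c_{xx} & c_{xy} & c_{xz} \\ c_{yx} & c_{yy} & c_{yz} \\ c_{zx} & c_{zy} & c_{zz} \end{pmatrix}$$

where $c_{ii}$ represents the variance of colour component i and $c_{ij}$ represents the
covariance between components i and j. The covariance matrix is
symmetrical ($c_{ij} = c_{ji}$), so only six numbers are needed to store it.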
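The descriptor elements (1) and (2)(a) to (c) map naturally onto a small record type. The sketch below assumes a layout in which each covariance matrix is stored as its six upper-triangle entries, exploiting the symmetry noted above; all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector3 = Tuple[float, float, float]
# Upper triangle of a symmetric 3x3 matrix: (c_xx, c_xy, c_xz, c_yy, c_yz, c_zz)
PackedCov = Tuple[float, float, float, float, float, float]


def pack_covariance(c) -> PackedCov:
    """Keep only the six distinct entries of a symmetric 3x3 matrix."""
    return (c[0][0], c[0][1], c[0][2], c[1][1], c[1][2], c[2][2])


def unpack_covariance(p: PackedCov):
    """Rebuild the full symmetric matrix using c_ji = c_ij."""
    xx, xy, xz, yy, yz, zz = p
    return [[xx, xy, xz], [xy, yy, yz], [xz, yz, zz]]


@dataclass
class DominantColour:
    weight: float   # fraction of the block's pixels attributed to this colour
    mean: Vector3   # (m_x, m_y, m_z)
    cov: PackedCov  # the six stored covariance values


@dataclass
class BlockDescriptor:
    colours: List[DominantColour]

    @property
    def degree(self) -> int:
        """Number of dominant colours, n."""
        return len(self.colours)

    def stored_numbers(self) -> int:
        # the degree itself + (1 weight + 3 mean + 6 covariance) per colour
        return 1 + 10 * self.degree
```

With three dominant colours `stored_numbers()` returns 31, whereas a full histogram needs one count per predetermined colour.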

In obtaining the descriptor as discussed above, the colour distribution 
is treated as n different sub-distributions, where n is the number of dominant 
colours, each sub-distribution centring about a respective dominant colour as 
the mean. The ranges of the sub-distributions may well overlap, and a 
suitable algorithm is used to determine the range of each distribution for 
calculating the weight, mean and covariance matrix, as will be understood by 
a person skilled in the art. One way of estimating the descriptor components 
is to fit Gaussian functions centred at histogram peaks to the histogram by 
minimising the difference between the actual histogram counts and values 
estimated from the mixture of Gaussian functions. 
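The text leaves the choice of algorithm open and mentions fitting Gaussian functions to the histogram. The sketch below deliberately substitutes a simpler technique, hard assignment, so that the three descriptor values are easy to see: each pixel is assigned to its nearest peak colour (squared Euclidean distance in RGB), and the weight, mean and covariance are the sample statistics of each group. It is not the Gaussian-mixture fit described above, and the name `estimate_components` is hypothetical.

```python
def estimate_components(pixels, peaks):
    """Hard-assignment estimate of (weight, mean, covariance) per peak colour.

    pixels: list of (r, g, b) tuples for one block.
    peaks:  list of (r, g, b) dominant colours, e.g. histogram peaks.
    Covariance is the population covariance (divide by the group size).
    """
    groups = [[] for _ in peaks]
    for p in pixels:
        nearest = min(range(len(peaks)),
                      key=lambda k: sum((p[i] - peaks[k][i]) ** 2 for i in range(3)))
        groups[nearest].append(p)

    components = []
    for group in groups:
        if not group:
            continue  # a peak that attracted no pixels is dropped
        n = len(group)
        mean = [sum(p[i] for p in group) / n for i in range(3)]
        cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in group) / n
                for j in range(3)] for i in range(3)]
        components.append((n / len(pixels), mean, cov))
    return components
```

Hard assignment ignores the overlap between sub-distributions that the text mentions, and it can produce a singular covariance matrix (for example when a group has no variation in one channel); a real system would need some safeguard such as adding a small constant to the diagonal, which is an assumption, not part of the text.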

The descriptor database 10 stores a descriptor as defined above for 
each block of each image stored in the image database 8. The representation 
of the colour distribution within each block using the descriptor structure 
described above contains a large amount of descriptive information, but 
requires less storage space than, for example, full histogram information. 




As an example, a colour histogram for a specific block may exhibit 
three peaks corresponding to three dominant colours. The histogram colour 
distribution is analysed as three colour sub-distributions and results in a 
descriptor including the number three indicating the number of dominant 
colours, three weights, three mean vectors, corresponding to the colour
vectors for the three peaks, and three corresponding covariance matrices. 

The system is used to search for images in the image database using 
the descriptors stored in the descriptor database. The present embodiment 
provides two search methods: a single colour based search and a region based
search.

The single colour based search will be described with reference to the 
flowchart shown in Fig. 2. 

In the single colour based search, the user inputs a query by selecting a 
colour to be searched, using the pointing device 6 and a menu such as a colour 
wheel or a palette displayed on the display unit 4 (step 102). The control unit
2 then obtains the corresponding colour vector for the query colour, the colour 
vector having components which are the respective colour components for the 
query colour, that is, the red, green and blue components (step 104). 

The control unit 2 then uses the search engine 12 to search for images 
in the image database 8 that include the query colour. The search engine 12
performs a matching procedure using the query colour vector and the 
descriptors for the image blocks in the descriptor database 10 (step 106). 
The matching procedure is performed using the following formula for
calculating a matching value M:

$$M = \exp\left[-\tfrac{1}{2}(\mathbf{q}-\mathbf{m})^{T} C^{-1} (\mathbf{q}-\mathbf{m})\right]$$

where q is the query colour vector. A matching value is calculated for each
dominant colour in each block using each value of m and C in the descriptor
for the block. Thus, for a descriptor of degree n, n matching values are
obtained.
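A minimal sketch of the matching value M in plain Python, with a hand-written 3 x 3 inverse so that it has no dependencies. Passing each dominant colour as a (mean, covariance) pair is an assumed layout, and all function names are hypothetical.

```python
import math


def inverse3(c):
    """Inverse of a 3x3 matrix via the adjugate; raises if it is singular."""
    (a, b, cc), (d, e, f), (g, h, i) = c
    det = a * (e * i - f * h) - b * (d * i - f * g) + cc * (d * h - e * g)
    if det == 0:
        raise ValueError("covariance matrix is singular")
    adj = [[e * i - f * h, cc * h - b * i, b * f - cc * e],
           [f * g - d * i, a * i - cc * g, cc * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def matching_value(q, mean, cov):
    """M = exp(-1/2 (q - m)^T C^-1 (q - m)) for one dominant colour."""
    inv = inverse3(cov)
    d = [q[k] - mean[k] for k in range(3)]
    quad = sum(d[r] * inv[r][s] * d[s] for r in range(3) for s in range(3))
    return math.exp(-0.5 * quad)


def block_matching_values(q, components):
    """One M per dominant colour, so a descriptor of degree n gives n values.

    components: list of (mean, cov) pairs for the block's dominant colours.
    """
    return [matching_value(q, m, c) for m, c in components]
```

M equals 1 when the query colour coincides with a dominant colour and decays as the Mahalanobis distance grows, so a broad sub-distribution (large variances) tolerates a larger colour difference than a narrow one.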

The matching value can be considered as the value, up to a normalising
constant, of the probability density function corresponding to each colour
sub-distribution in the block at the point defined by the query colour value,
modelling the probability density function as a Gaussian function.

For a given descriptor, the larger the matching value M, the closer the
corresponding block is to a match with the selected colour.

When matching values have been calculated for each descriptor in the
descriptor database 10, the search engine 12 orders the results by the size of M,
starting with the largest values of M, considering only the largest value of M
for any descriptor of degree greater than one (step 108).

The control unit 2 receives the results of the matching procedure from the
search engine 12, and retrieves from the image database a predetermined
number K of those images which are the closest matches, corresponding to the
K highest values of M. Those images are then displayed on the display unit 4
(step 110). The set-up of the control unit 2 determines how many of the
closest matches are to be displayed on the display unit. That number can be
changed by the user.
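Steps 108 and 110 can be sketched as a ranking over blocks followed by a top-K selection. This is a sketch under stated assumptions: the matching values are assumed to be already computed and keyed by hypothetical (image id, block id) pairs, and each image is scored by its best block so that the same image is not retrieved twice, which is an interpretation rather than something the text specifies.

```python
import heapq


def top_k_images(block_scores, k):
    """Return the k (image_id, score) pairs with the highest scores.

    block_scores maps (image_id, block_id) -> list of M values, one per
    dominant colour. A block is scored by its largest M (step 108); an image
    is scored by its best block (an assumption, so no image appears twice).
    """
    best_per_image = {}
    for (image_id, _block_id), values in block_scores.items():
        score = max(values)
        if score > best_per_image.get(image_id, float("-inf")):
            best_per_image[image_id] = score
    # heapq.nlargest returns the results in descending order of score
    return heapq.nlargest(k, best_per_image.items(), key=lambda item: item[1])
```

`heapq.nlargest` avoids sorting the whole database when K is much smaller than the number of stored images; K itself corresponds to the user-adjustable set-up value described above.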

As will be understood from the above description, the single colour
based search retrieves from the image database 8 those images having a
block with a dominant colour that is the same as or close to the colour
initially selected by the user.

The region based search will be described with reference to the 
flowchart shown in Fig. 3. 

In the region based search, the control unit 2 operates to display a 
predetermined set of search images, which are images from the image 
database 8, on the display unit 4 (step 202). The search images may be 
wholly determined by the set-up of the control unit, or may depend on other 
requirements input by the user. For example, in a larger system supporting 
keyword-based searches the user might input the word "leaves" which would 
result in a predetermined set of images depicting leaves being shown as the 
images for colour based search. 

Each of the search images is shown with a grid dividing the image into 
blocks, corresponding to blocks for which the descriptors have been derived. 
The user then selects, using the pointing device 6, a block on one of the 
images which shows a colour distribution of interest (step 204). 

The control unit 2 then retrieves the descriptor for the selected image 
block from the descriptor database 10 and uses it as a query descriptor (step 
206). The descriptor is already available because the search images are taken
from the image database 8. The search engine then performs a search
comparing that query descriptor with the other descriptors stored in the
descriptor database using matching functions (step 208).

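For a query descriptor having a mean value $\mathbf{m}_a$ and covariance matrix
$C_a$ for one of the dominant colours and another descriptor having a mean
value $\mathbf{m}_b$ and covariance matrix $C_b$ for one of the dominant colours, a
matching function is defined as:

$$m_s(a, b) = \int \exp\left[-\tfrac{1}{2}(\mathbf{q}-\mathbf{m}_a)^{T} C_a^{-1} (\mathbf{q}-\mathbf{m}_a)\right] \exp\left[-\tfrac{1}{2}(\mathbf{q}-\mathbf{m}_b)^{T} C_b^{-1} (\mathbf{q}-\mathbf{m}_b)\right] d\mathbf{q}$$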

where q is a 3-d vector akin to a colour vector and where the integral is 
calculated over the range from (0, 0, 0) to (255, 255, 255) where 255 is the 
maximum value of a colour component. The range of the integral in other 
embodiments will depend upon the colour co-ordinate system and 
representation used. 
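If the integral is taken over all of three-dimensional space instead of the colour cube, it has a standard closed form for the product of two Gaussian functions:

$$m_s(a, b) \approx (2\pi)^{3/2}\sqrt{\frac{|C_a|\,|C_b|}{|C_a + C_b|}}\,\exp\left[-\tfrac{1}{2}(\mathbf{m}_a-\mathbf{m}_b)^{T}(C_a + C_b)^{-1}(\mathbf{m}_a-\mathbf{m}_b)\right]$$

The sketch below evaluates this approximation instead of integrating numerically over the cube. Treating the whole-space value as a substitute for the cube-limited integral is an assumption, reasonable when both sub-distributions lie well inside the cube; the function names are hypothetical.

```python
import math


def det3(c):
    """Determinant of a 3x3 matrix."""
    (a, b, cc), (d, e, f), (g, h, i) = c
    return a * (e * i - f * h) - b * (d * i - f * g) + cc * (d * h - e * g)


def inverse3(c):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, cc), (d, e, f), (g, h, i) = c
    det = det3(c)
    adj = [[e * i - f * h, cc * h - b * i, b * f - cc * e],
           [f * g - d * i, a * i - cc * g, cc * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def overlap(mean_a, cov_a, mean_b, cov_b):
    """Closed-form integral over all of R^3 of the product of two
    unnormalised Gaussians; approximates m_s(a, b) over the colour cube."""
    s = [[cov_a[r][c] + cov_b[r][c] for c in range(3)] for r in range(3)]
    inv = inverse3(s)
    d = [mean_a[k] - mean_b[k] for k in range(3)]
    quad = sum(d[r] * inv[r][c] * d[c] for r in range(3) for c in range(3))
    scale = (2 * math.pi) ** 1.5 * math.sqrt(det3(cov_a) * det3(cov_b) / det3(s))
    return scale * math.exp(-0.5 * quad)
```

When a sub-distribution sits close to a face of the colour cube (near 0 or 255 in some channel), this whole-space value overestimates the integral defined above.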

This is equivalent to modelling the corresponding colour sub-distributions
for the image blocks as probability density functions in the form of
Gaussian functions, and determining the degree to which they overlap, or in
other words determining the similarity between them. The larger the result of
the above calculation, the closer are the corresponding colour distributions. In
this case, the function determines the degree to which a colour sub-distribution
in the query image block and a colour sub-distribution in a stored
image block overlap.




The full matching function for matching one descriptor with another is 
defined as: 

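$$m_f = \sum_{i}\sum_{j} v_i w_j m_s(i, j)$$

where $v_i$ is the weight of the i-th sub-distribution of the query descriptor,
$w_j$ is the weight of the j-th sub-distribution of the stored descriptor, and the
summation is over all pairs of sub-distributions from the two regions.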

Thus, for each dominant colour described in the descriptor of a query 
image block, a matching value is calculated with respect to each dominant 
colour in a descriptor from the descriptor database 10. The resulting matching 
values are weighted and then summed to give a final matching value
corresponding to $m_f$.
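The weighted sum $m_f$ and the ordering of steps 210 and 212 can be sketched as follows. To keep the example self-contained, the pairwise function $m_s$ is passed in as a parameter (`pair_match`), so any evaluation of the integral above can be plugged in. The (weight, mean, covariance) layout, the descriptor ids and the function names are all assumptions.

```python
import heapq


def full_match(query, stored, pair_match):
    """m_f = sum over i, j of v_i * w_j * m_s(i, j).

    query, stored: lists of (weight, mean, cov) triples, one per dominant colour.
    pair_match(mean_a, cov_a, mean_b, cov_b) returns m_s for one pair.
    """
    return sum(v * w * pair_match(ma, ca, mb, cb)
               for v, ma, ca in query
               for w, mb, cb in stored)


def rank_descriptors(query, database, pair_match, k):
    """Return the k (descriptor_id, m_f) pairs with the largest m_f.

    database maps a hypothetical descriptor id to its list of triples.
    """
    scores = ((key, full_match(query, desc, pair_match))
              for key, desc in database.items())
    return heapq.nlargest(k, scores, key=lambda item: item[1])
```

Usage with a toy stand-in for $m_s$ that scores 1 for identical means and 0 otherwise: a query with weights 0.6 and 0.4 against a stored descriptor sharing only the first colour (weight 0.5) gives $m_f = 0.6 \times 0.5 = 0.3$.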

Full matching values are calculated as described above for all 
descriptors in the database with respect to the query descriptor. As in the 
single colour based search, the results are ordered (step 210), and the K 
images with the highest matching values, indicating the closest matches, are 
displayed on the display unit for the user (step 212). 

A further iteration of a search can be performed by selecting an image 
region in an image found in the previous search. 

A system according to the invention may, for example, be provided in 
an image library. Alternatively, the databases may be sited remote from the 
control unit of the system, connected to the control unit by a temporary link 
such as a telephone line or by a network such as the internet. The image and
descriptor databases may be provided, for example, on portable data storage
media such as CD-ROMs or DVDs.

In the above description, the colour representations have been
described in terms of red, green and blue colour components. Of course, other
representations can be used, such as a representation using hue, saturation
and intensity, or the YUV co-ordinate system. Alternatively, a subset of the
colour components in any colour space can be used, for example only hue and
saturation in HSI.

The embodiment of the invention described above uses descriptors
derived for rectangular blocks of images. Other sub-regions of the image
could be used as the basis for the descriptors. For example, regions of
different shapes and sizes could be used. Alternatively, descriptors may be
derived for regions of the image corresponding to objects, for example, a car,
a house or a person. In either case, descriptors may be derived for all of the
image or only part of it.

In the search procedure, instead of inputting a simple colour query or
selecting an image block, the user can, for example, use the pointing device to
describe a region of an image, say, by encircling it, whereupon the control
unit derives a descriptor for that region and uses it for searching in a similar
manner as described above. Also, instead of using images already stored in
the image database for initiating a search, an image could be input into the
system using, for example, an image scanner or a digital camera. In order to
perform a search in such a situation, again the system first derives descriptors
for the image or regions of the image, either automatically or as determined
by the user.

Appropriate aspects of the invention can be implemented using 
hardware or software. 

In the above embodiments, the component sub-distributions for each 
representative colour are approximated using Gaussian functions, and the 
mean and covariances of those functions are used as descriptor values. 
However, other functions or parameters can be used to approximate the 
component distributions, for example, using basis functions such as sine and 
cosine, with descriptors based on those functions. 



CLAIMS 



1. A method of representing a colour image comprising selecting a 
region of the image, selecting one or more colours as representative colours 
for the region and, for a region having two or more representative colours, 
calculating for each representative colour at least two parameters related to the 
colour distribution in relation to the respective representative colour and using 
said parameters to derive descriptors for the image region. 

2. A method as claimed in claim 1 wherein the parameters are 
statistical values related to the colour distribution in relation to the respective 
representative colour in the region. 

3. A method as claimed in claim 1 or claim 2 comprising storing 
said descriptors in data storage means. 

4. A method as claimed in any one of claims 1 to 3 wherein the 
step of selecting representative colours comprises deriving a colour histogram 
for the region. 

5. A method as claimed in claim 4 wherein the step of selecting 
representative colours comprises identifying local peaks in the colour 
histogram and selecting the corresponding colours as representative colours. 



6. A method as claimed in claim 5 dependent on claim 2 wherein
the local peaks are treated as mean values for colour distributions within the
region and said statistical values are the first two central moments of the
respective distribution.

7. A method as claimed in any one of claims 1 to 6 wherein the 
image region is independent of the image content. 

8. A method as claimed in claim 7 wherein the image region is a
polygon.

9. A method as claimed in any one of claims 1 to 6 wherein the 
image region corresponds to an object. 


10. A method of representing a colour image by processing signals 
corresponding to said image, the method comprising selecting a region of the 
image, identifying a number of representative colours for the region, and, for a 
region having two or more representative colours, deriving a function
approximating the colour distribution corresponding to each representative
colour, and using said functions to define colour descriptors of said region.




11. A method of representing a colour image by processing signals 
corresponding to said image, the method comprising selecting a region of the 
image, identifying a number of representative colours for the region, and, for a 
region having more than one representative colour, deriving for each 
representative colour an indication of the spread of the colour distribution in 
relation to the representative colour and using said indication to derive 
descriptors for the image region. 

12. A method of searching for colour images stored in data storage 
means comprising inputting a query relating to colour of an image, comparing 
said query with descriptors for stored images derived in accordance with a 
method as claimed in any one of claims 1 to 11 using a matching function and
selecting and displaying at least one image for which the matching function 
indicates a close match between the query and at least part of the image. 

13. A method as claimed in claim 12 wherein inputting a query 
comprises selecting a query image region and obtaining descriptors for said 
image region in accordance with a method as claimed in any one of claims 1
to 11 and wherein the matching function uses the descriptors for the query and
for the stored images. 




14. A method as claimed in claim 12 wherein the matching
function is based on

$$M = \exp\left[-\tfrac{1}{2}(\mathbf{q}-\mathbf{m})^{T} C^{-1} (\mathbf{q}-\mathbf{m})\right]$$

where q is a colour vector corresponding to a query and m and C are
descriptor values representing first and second central moments of the colour
distribution for a representative colour.

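15. A method as claimed in claim 12 wherein the matching
function is based on

$$m_f = \sum_{i}\sum_{j} v_i w_j m_s(i, j)$$

where

$$m_s(a, b) = \int \exp\left[-\tfrac{1}{2}(\mathbf{q}-\mathbf{m}_a)^{T} C_a^{-1} (\mathbf{q}-\mathbf{m}_a)\right] \exp\left[-\tfrac{1}{2}(\mathbf{q}-\mathbf{m}_b)^{T} C_b^{-1} (\mathbf{q}-\mathbf{m}_b)\right] d\mathbf{q}$$

and where m and C are descriptor values representing first and second central
moments for colour distributions for representative colours.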

16. A method as claimed in any one of claims 12 to 15 wherein a
query is selected from a plurality of images displayed on display means.

17. A method as claimed in claim 12 wherein inputting a query 
comprises selecting a single colour value. 
18. A method as claimed in claim 12 wherein inputting a query
comprises specifying one or more component distributions. 

19. A method as claimed in claim 12 wherein a query is input 
using only some of the components of the colour space.

20. An apparatus for implementing a method according to any one 
of claims 1 to 19. 

21. A computer system programmed to operate according to a
method as claimed in any one of claims 1 to 19.

22. A computer program for implementing a method as claimed in 
any one of claims 1 to 19. 


23. A computer-readable medium storing computer-executable 
process steps for implementing a method as claimed in any one of claims 1 to 
19. 
24. A method for searching for colour images by processing
signals substantially as hereinbefore described with reference to the 
accompanying drawings. 




25. A computer system substantially as hereinbefore described 
with reference to the accompanying drawings. 